"# Introduction to JumpStart - Text to Image (Inference only)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d1a9149a",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "bdc23bae",
"metadata": {},
"source": [
"***\n",
"Welcome to Amazon [SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use JumpStart to solve many machine learning tasks with one click in SageMaker Studio, or through the [SageMaker JumpStart API](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart). In this demo notebook, we demonstrate how to use the JumpStart API to generate images from text using state-of-the-art Stable Diffusion models.\n",
"\n",
"Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt. A diffusion model is trained by learning to remove noise that was added to a real image, and this de-noising process generates a realistic image. Diffusion models can also generate images from text alone by conditioning the generation process on the text. Stable Diffusion, for instance, is a latent diffusion model in which the model learns to recognize shapes in a pure-noise image and gradually brings those shapes into focus when they match the words in the input text.\n",
"\n",
"Deploying large models and running inference on models such as Stable Diffusion is often challenging, with issues such as CUDA out-of-memory errors and exceeding the payload size limit. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. Furthermore, it provides guidance on each step of the process, including the recommended instance types, how to select parameters to guide the image generation process, prompt engineering, etc. Moreover, you can deploy and run inference on any of the 80+ diffusion models from JumpStart without having to write any code of your own.\n",
"\n",
"In this lab, you will learn how to use JumpStart to generate highly realistic and artistic images of any subject/object/environment/scene. This may be as simple as an image of a cute dog, or as detailed as a hyper-realistic image of a beautifully decorated cozy kitchen by Pixar in the style of Greg Rutkowski with dramatic sunset lighting and long shadows with cinematic atmosphere. This can be used to design products and build catalogs for e-commerce business needs, or to generate realistic art pieces or stock images.\n",
"\n",
"Notebook license: This notebook is provided under [MIT No Attribution license](https://github.com/aws/mit-0).\n",
"\n",
"Model license: By using this model, you agree to the [CreativeML Open RAIL-M++ license](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL).\n",
"\n",
"***"
]
},
{
"cell_type": "markdown",
"id": "5db28351",
"metadata": {},
"source": [
"1. [Set Up](#1.-Set-Up)\n",
"2. [Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model)\n",
" * [Select a model](#2.1.-Select-a-Model)\n",
" * [Retrieve JumpStart Artifacts & Deploy an Endpoint](#2.2.-Retrieve-JumpStart-Artifacts-&-Deploy-an-Endpoint)\n",
" * [Query endpoint and parse response](#2.3.-Query-endpoint-and-parse-response)\n",
" * [Clean up the endpoint](#2.7.-Clean-up-the-endpoint)\n",
"3. [Conclusion](#3.-Conclusion)"
]
},
{
"cell_type": "markdown",
"id": "ce462973",
"metadata": {},
"source": [
"Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel, and in an Amazon SageMaker Notebook instance with the conda_python3 kernel."
]
},
{
"cell_type": "markdown",
"id": "9ea47727",
"metadata": {},
"source": [
"## 1. Set Up"
]
},
{
"cell_type": "markdown",
"id": "35b91e81",
"metadata": {},
"source": [
"***\n",
"Before executing the notebook, some initial setup is required. This notebook requires the latest versions of `sagemaker` and `ipywidgets`.\n",
"To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. \n",
"\n",
"***"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90518e45",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import sagemaker, boto3, json\n",
"from sagemaker import get_execution_role\n",
"\n",
"aws_role = get_execution_role()\n",
"aws_region = boto3.Session().region_name\n",
"sess = sagemaker.Session()"
]
},
{
"cell_type": "markdown",
"id": "310fca48",
"metadata": {},
"source": [
"## 2. Run inference on the pre-trained model\n",
"\n",
"***\n",
"\n",
"Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset.\n",
"***"
]
},
{
"cell_type": "markdown",
"id": "0e072e72-8bb4-4a8d-b887-2e9658dc3672",
"metadata": {},
"source": [
"### 2.1. Select a Model\n",
"***\n",
"You can continue with the default model, or choose a different model from the dropdown generated upon running the next cell. A complete list of SageMaker pre-trained models can also be accessed at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#). For this lab, we recommend using the default `model_id`.\n",
"### 2.2. Retrieve JumpStart Artifacts & Deploy an Endpoint\n",
"\n",
"***\n",
"\n",
"Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. \n",
"\n",
"\n",
"### This may take up to 10 minutes. Please do not kill the kernel while you wait.\n",
"\n",
"While you wait, you can check out the [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/) blog to learn more about the Stable Diffusion model and JumpStart.\n",
"# Please use the ml.g5.24xlarge instance type if it is available in your region. ml.g5.24xlarge has 24 GB of GPU memory, compared to 16 GB on ml.p3.2xlarge, and supports generating larger, higher-quality images.\n",
"This model also supports many advanced parameters while performing inference. They include:\n",
"\n",
"* **prompt**: prompt to guide the image generation. Must be specified and can be a string or a list of strings.\n",
"* **width**: width of the generated image. If specified, it must be a positive integer divisible by 8.\n",
"* **height**: height of the generated image. If specified, it must be a positive integer divisible by 8.\n",
"* **num_inference_steps**: number of denoising steps during image generation. More steps lead to a higher-quality image. If specified, it must be a positive integer.\n",
"* **guidance_scale**: a higher guidance scale produces an image more closely tied to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.\n",
"* **negative_prompt**: guides image generation away from this prompt. If specified, it must be a string or a list of strings, and it is used together with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if prompt is a list of strings then negative_prompt must also be a list of strings.\n",
"* **num_images_per_prompt**: number of images returned per prompt. If specified, it must be a positive integer.\n",
"* **seed**: Fix the randomized state for reproducibility. If specified, it must be an integer.\n",
"\n",
"***"
]
},
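{
"cell_type": "markdown",
"id": "a1f00001",
"metadata": {},
"source": [
"The advanced parameters listed above can be combined into a JSON request body. The cell below is a minimal sketch of how such a payload might be constructed and serialized; the field names follow the list above, but the exact accepted fields depend on the selected model, and the actual endpoint invocation is intentionally omitted here.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1f00002",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Illustrative payload using the advanced parameters described above.\n",
"# (Sketch only: the endpoint query itself is not shown in this cell.)\n",
"payload = {\n",
"    'prompt': 'a photo of an astronaut riding a horse on mars',\n",
"    'width': 512,  # must be a positive integer divisible by 8\n",
"    'height': 512,  # must be a positive integer divisible by 8\n",
"    'num_inference_steps': 50,\n",
"    'guidance_scale': 7.5,  # values <= 1 are ignored\n",
"    'num_images_per_prompt': 1,\n",
"    'seed': 1,  # fix the random state for reproducibility\n",
"}\n",
"\n",
"# The endpoint expects the payload serialized as JSON-encoded bytes.\n",
"body = json.dumps(payload).encode('utf-8')\n",
"print(json.loads(body)['prompt'])"
]
},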
{
"cell_type": "markdown",
"id": "4fee71b1-5584-4916-bd78-5b895be08d41",
"metadata": {},
"source": [
"### 2.3. Query endpoint and parse response\n",
"\n",
"---\n",
"The training data for different models used different image sizes, and a model typically performs best when the generated image has the same dimensions as its training data; mismatched dimensions may result in a black image. Stable Diffusion v1-4 was trained on 512x512 images and Stable Diffusion v2 was trained on 768x768 images.\n",
"\n",
"The default response type from the endpoint is a nested array of RGB values, and if the generated image is large this may exceed the response size limit. To address this, the endpoint also supports returning each image as JPEG bytes. To enable this, set `Accept = 'application/json;jpeg'`.\n",
"\n",
"Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be broken down into three pieces: (i) the type of image (photograph/sketch/painting etc.), (ii) the description (subject/object/environment/scene etc.), and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process, and adding more details helps.\n",
"\n",
"To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images in the style of particular artists, you can use phrases like “by Pablo Picasso”, “oil painting by Rembrandt”, “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”, and you can combine different artists as well. To generate an artistic image by category, add the art category to the prompt, such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or camera lens, for example 35mm wide lens or 85mm wide lens, and details about the framing (portrait/landscape/close up etc.).\n",
"\n",
"Note that the model generates different images even if the same prompt is given multiple times, so you can generate multiple images and select the one that best suits your application.\n",
"\n",
"---"
]
},
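{
"cell_type": "markdown",
"id": "b2f00001",
"metadata": {},
"source": [
"With `Accept = 'application/json;jpeg'`, each image is returned inside the JSON response as base64-encoded JPEG bytes. The cell below is a small, self-contained sketch of the decoding step, using placeholder bytes in place of a real endpoint response; the `generated_images` key follows the response format described in this notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2f00002",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"\n",
"# Simulated endpoint response: each entry of 'generated_images' holds a\n",
"# base64-encoded image (placeholder bytes here, not a real JPEG).\n",
"fake_image_bytes = b'not-a-real-jpeg'\n",
"response = {\n",
"    'generated_images': [base64.b64encode(fake_image_bytes).decode('ascii')],\n",
"    'prompt': 'a photo of a cute dog',\n",
"}\n",
"\n",
"# Decode each image back to raw bytes before saving or displaying it.\n",
"decoded = [base64.b64decode(img) for img in response['generated_images']]\n",
"print(decoded[0] == fake_image_bytes)"
]
},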
{
"cell_type": "code",
"execution_count": null,
"id": "2886ac49",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"prompts = [\n",
" \"a beautiful illustration of a young cybertronic hyderabadi american woman, round face, cateye glasses, purple colors, intricate, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by wlop and artgerm and greg rutkowski and alphonse mucha, masterpiece\",\n",
" \"a photorealistic hyperrealistic render of an interior of a beautifully decorated cozy kitchen by pixar, greg rutkowski, wlop, artgerm, dramatic moody sunset lighting, long shadows, volumetric, cinematic atmosphere, octane render, artstation, 8 k\",\n",
" \"symmetry!! portrait of nicolas cage, long hair in the wind, smile, happy, white vest, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha\",\n",
" \"a stunningly detailed stained glass window of a beautiful poison ivy with green skin wearing a business suit, dark eyeliner, intricate, elegant, highly detailed, digital painting, artstation, concept art, sharp focus, illustration, art by greg rutkowski and alphonse mucha\",\n",
" \"a fantasy style portrait painting of rachel lane / alison brie / sally kellerman hybrid in the style of francois boucher oil painting unreal 5 daz. rpg portrait, extremely detailed artgerm greg rutkowski alphonse mucha\",\n",
" \"symmetry!! portrait of vanessa hudgens in the style of horizon zero dawn, machine face, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k\",\n",
" \"landscape of the beautiful city of paris rebuilt near the pacific ocean in sunny california, amazing weather, sandy beach, palm trees, splendid haussmann architecture, digital painting, highly detailed, intricate, without duplication, art by craig mullins, greg rutkwowski, concept art, matte painting, trending on artstation\",\n",
"]"
]
},
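{
"cell_type": "markdown",
"id": "c3f00001",
"metadata": {},
"source": [
"The three-part prompt template described earlier (image type, description, style) can be sketched as a small helper. `compose_prompt` is a hypothetical function introduced here for illustration only; it is not part of JumpStart.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3f00002",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper combining the three prompt pieces discussed above:\n",
"# (i) type of image, (ii) description, (iii) style.\n",
"def compose_prompt(image_type, description, style):\n",
"    return image_type + ' of ' + description + ', ' + style\n",
"\n",
"prompt = compose_prompt('a photograph', 'a lion on a beach', 'hyper realistic, 35mm wide lens')\n",
"print(prompt)"
]
},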
{
"cell_type": "markdown",
"id": "870d1173",
"metadata": {},
"source": [
"### 2.7. Clean up the endpoint\n",
"---\n",
"A SageMaker endpoint remains running (and continues to incur charges) until you delete it. Once you are done using the model, make sure to delete both the model and the deployed endpoint.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63cb143b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Delete the SageMaker endpoint\n",
"model_predictor.delete_model()\n",
"model_predictor.delete_endpoint()"
]
},
{
"cell_type": "markdown",
"id": "18a37c1c",
"metadata": {},
"source": [
"## 3. Conclusion\n",
"---\n",
"In this tutorial, we learned how to deploy a pre-trained Stable Diffusion model on SageMaker using JumpStart. We saw that Stable Diffusion models can generate highly photorealistic images from text. JumpStart provides both Stable Diffusion 1 and Stable Diffusion 2, along with their FP16 revisions. JumpStart also provides 84 additional diffusion models which have been trained to generate images for different themes and in different languages. You can deploy any of these models without writing any code of your own. To deploy a specific model, you can select a `model_id` in the dropdown menu in [2.1. Select a Model](#2.1.-Select-a-Model).\n",
"\n",
"You can tweak the image generation process by selecting the appropriate parameters during inference. Guidance on how to set these parameters is provided in [2.2. Retrieve JumpStart Artifacts & Deploy an Endpoint](#2.2.-Retrieve-JumpStart-Artifacts-&-Deploy-an-Endpoint). We also saw how returning a large image payload can exceed the response size limit. JumpStart handles this by encoding the image at the endpoint and decoding it in the notebook before displaying it. Finally, we saw how prompt engineering is a crucial step in generating high-quality images. We discussed how to set your own prompts and saw some examples of good prompts.\n",
"\n",
"To learn more about inference on pre-trained Stable Diffusion models, please check out the blog [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/).\n",
"\n",
"Although creating impressive images can find use in industries ranging from art to NFTs and beyond, today we also expect AI to be personalizable. JumpStart provides fine-tuning capability to the pre-trained models so that you can adapt the model to your own use case with as little as five training images. This can be useful when creating art, logos, custom designs, NFTs, and so on, or fun stuff such as generating custom AI images of your pets or avatars of yourself. To learn more about Stable Diffusion fine-tuning, please check out the blog [Fine-tune text-to-image Stable Diffusion models with Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/fine-tune-text-to-image-stable-diffusion-models-with-amazon-sagemaker-jumpstart/)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0f2f8a7b",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n"